Using a Swap Instruction to Coalesce Loads and Stores
Abstract
A swap instruction, which exchanges a value in memory with a value in a register, is available on many architectures. Its primary application has been process synchronization. In this paper we show that a swap instruction can often be used to coalesce loads and stores in a variety of applications. We describe the analysis necessary to detect opportunities to exploit a swap and the transformation required to coalesce a load and a store into a swap instruction. The results show that the transformation reduces both the number of accesses to the memory system (data cache) and the number of executed instructions. In addition, the transformation reduces register pressure by one register at the point where the swap instruction is used, which sometimes enables other code-improving transformations.
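As a rough illustration of the kind of load/store pair the abstract refers to, the C sketch below shifts a value into an array in two ways: once with a separate load and store to the same address, and once with a single exchange. It is not taken from the paper. `swap_word` is a hypothetical stand-in that only models what a hardware swap instruction would do, and the function names and the array-shifting example are assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a swap instruction: writes reg to *mem and
   returns the value *mem held before. A real compiler would emit one
   machine instruction here instead of a load followed by a store. */
static int swap_word(int *mem, int reg)
{
    int old = *mem;
    *mem = reg;
    return old;
}

/* Separate load and store: shifts every element one slot to the right,
   places first in a[0], and returns the value shifted out of the end. */
static int shift_in_load_store(int *a, size_t n, int first)
{
    assert(a != NULL || n == 0);
    int carry = first;
    for (size_t i = 0; i < n; i++) {
        int tmp = a[i];   /* load from a[i] */
        a[i] = carry;     /* store to the same address */
        carry = tmp;      /* tmp occupies an extra register */
    }
    return carry;
}

/* Coalesced form: the load and the store to a[i] become one exchange,
   so the temporary tmp is no longer needed. */
static int shift_in_swap(int *a, size_t n, int first)
{
    assert(a != NULL || n == 0);
    int carry = first;
    for (size_t i = 0; i < n; i++)
        carry = swap_word(&a[i], carry);
    return carry;
}
```

Both functions leave the array in the same state and return the same value. The second needs one fewer live temporary inside the loop, which is the register-pressure effect the abstract mentions.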
Similar resources
Out-of-Order Memory Accesses Using a Load Wait Buffer
Many dynamic scheduling techniques take advantage of out-of-order instruction execution to hide memory access latency. However, as the disparity between processor and memory speeds increases, delays in the load-store queue become more of a bottleneck. One way to mitigate these delays is to allow loads and stores to execute and retire from the load-store queue (LSQ) out-of-order. Unfortunately, w...
On the Uncontended Complexity of Consensus
Lock-free algorithms are not required to guarantee a bound on the number of steps an operation takes under contention, so we cannot use the usual worst-case analysis to quantify them. A natural alternative is to consider the worst-case time complexity of operations executed in the more common uncontended case. Many state-of-the-art lock-free algorithms rely on compare-and-swap (CAS) or similar ...
Franklin and Sohi : Arb - a Hardware Mechanism for Dynamic Reordering of Memory
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardwa...
ARB: A Hardware Mechanism for Dynamic Reordering of Memory References
To exploit instruction level parallelism, it is important not only to execute multiple memory references per cycle, but also to reorder memory references, especially to execute loads before stores that precede them in the sequential instruction stream. To guarantee correctness of execution in such situations, memory reference addresses have to be disambiguated. This paper presents a novel hardw...
Address-free memory access based on program syntax correlation of loads and stores
An increasing cache latency in next-generation processors incurs profound performance impacts in spite of advanced out-of-order execution techniques. One way to circumvent this cache latency problem is to predict load values at the onset of pipeline execution by exploiting either the load value locality or the address correlation of stores and loads. In this paper, we describe a new load value ...